ABSTRACT


For mortality data, a lower-order polynomial does not provide a good fit, especially when the data are grouped by age interval. One way to obtain a good fit is to increase the order of the polynomial. A higher-order polynomial works well for mortality data with five-year age intervals and is suitable when single-year mortality data are not available. We use a polynomial regression model in one explanatory variable to fit mortality data that are available by age interval. For higher-order polynomials, the problem of multicollinearity is addressed by centering the explanatory variable. The fitted results show that the polynomial regression model is a very good approximation for all three pairs of heterogeneous subpopulations (male and female, rural male and female, and urban male and female), and that it is the simplest suitable model for fitting these mortality data. Using the estimated mortality values by age interval, the other columns of the life tables are constructed. Life expectancies for these subpopulations are presented in the tables.
INTRODUCTION


There is a vast literature on stochastic modelling of mortality rates. Here the term stochastic model refers to a model that expresses mortality-related quantities, such as the mortality rate, the force of mortality or the number of survivors, as functions of age. Consequently, these models describe the age-specific mortality rates of certain cohorts or certain periods. Age- and time-dependent models of mortality are described by Lee and Carter (1992). Many authors have contributed to modelling age-specific mortality rates (Lee and Miller, 2001; Booth et al., 2002; Brouhns et al., 2002; Renshaw and Haberman, 2003; Currie et al., 2004; Currie, 2006; Renshaw and Haberman, 2006; Cairns et al., 2006). Each of these papers proposed a mortality model that aims to combine the characteristics, or eliminate the disadvantages, of existing models. Stochastic mortality models model either the central mortality rate or the initial mortality rate (Coughlan et al., 2007). Owing to the increasing focus on risk management and measurement for insurers and pension funds, the literature on stochastic mortality models has developed rapidly during the last decade (Booth and Tickle, 2008).


Stochastic models that express mortality as a function of age, or that take into account both the age and time dependency of mortality, can use various assumptions to capture the observed characteristics of mortality patterns. One of the most common is that the population is heterogeneous and composed of several subpopulations having different mortality dynamics (Rossolini and Piantanelli, 2001). Tabeau et al. (2001) and Vaupel (2005) presented age-dependent mortality models, which are separated into groups and expressed by polynomial and non-polynomial functions (see also Keyfitz and Littman, 1979 and Keyfitz, 1984). Brouhns et al. (2002) described a fitting methodology for the Lee-Carter model based on a Poisson model. Its main advantage is that it accounts for heteroskedasticity of the mortality data across ages. Other contributions in this direction are Renshaw and Haberman (2003, 2006), Cairns et al. (2007) and Cairns et al. (2011). Many functions can be approximated by polynomials using Taylor expansions (Vaupel, 2010; Tomas, 2012), which is an advantage in modelling mortality-related data.


The mortality patterns in human populations reflect biological, social and medical factors affecting our lives, and stochastic modelling is an important tool for the analysis of these patterns. It is known that the mortality rate in all human populations increases with age after sexual maturity. This increase is predominantly exponential and follows the Gompertz pattern. Although the exponential growth of mortality rates is observed over a wide range of ages, it does not hold in the early-life and late-life intervals. In a model of heterogeneous populations, Avraam et al. (2015) studied how differences in the parameters of the Gompertz equation describe different subpopulations and account for mortality dynamics at different ages.
MORTALITY IN HETEROGENEOUS POPULATIONS 


In the present paper, we apply a polynomial regression model to fit the mortality data of various subpopulations. The Least Squares (LS) method is used to estimate the free (unknown) parameters by minimizing the sum of squared residuals. The LS method overestimates mortality in early life and underestimates it at old ages when simple functions such as the Gompertz, Makeham or Weibull are used to fit the mortality rates. The LS method is favoured over many other methods because of its simplicity (provided the assumptions of the regression model are satisfied).


Each human population can be viewed as a number of subpopulations that differ genetically (sex: male and female) and/or by place of residence (rural and urban) and/or by ethnicity (black, white, etc.). We can therefore model the whole heterogeneous population in the following way. We consider a population consisting of two heterogeneous subpopulations in each case (males and females, rural males and females, and urban males and females). The mortality of each subpopulation is described by a polynomial regression with parameters specific to that subpopulation. Figures (a-j) show the differences in mortality rate among the subpopulations. The mortality for all the subpopulations increases at young ages, decreases for a short age interval and then increases again for old ages.
Figures (a-b): Differences in the Observed Age-Specific Mortality Rates (in log scales) for (a) India Rural and Urban Female Population between the Years 2012-16 and (b) India Rural and Urban Male Population between the Years 2012-16.

Figures (c-d): Differences in the Observed Age-Specific Mortality Rates (in log scales) for (c) India Rural and Urban Total Population between the Years 2012-16 and (d) India Males and Females between the Years 2000 and 2015.

Figures (e-f): Comparison of Observed Age-Specific Mortality Rates (in log scales) for (e) India Males and Females of the Year 2000 and (f) India Males and Females of the Year 2005.

Figures (g-h): Comparison of Observed Age-Specific Mortality Rates (in log scales) for (g) India Males and Females of the Year 2010 and (h) India Males and Females of the Year 2015.

Figures (i-j): Differences in the Observed Age-Specific Mortality Rates (in log scales) for (i) India Females and (j) India Males during the Years 2000, 2005, 2010 and 2015.


(Source: http://www.censusindia.gov.in/VitalStatistics/SRSLifeTable/SrslifeTable2012-16.html; https://www.who.int/healthinfo/mortality_data/en/)

Logarithms of the mortality rates for Indian males, females, and rural and urban populations are used for the analysis. Figures (i-j) show the mortality pattern of females and males for four years (2000, 2005, 2010 and 2015). Over the years, the male and female mortality curves keep a similar shape and share a general downward trend in mortality. Some changes over time in the shape of the curves can also be observed in Figures (a-j), particularly the increasing importance of the accident hump, the sharp increase in mortality at ages 15-19, in both the male and female mortality rates.
POLYNOMIAL APPROXIMATION TO MORTALITY DATA 


For the mortality analysis, we consider the mortality of different subpopulations: male, female, rural male, rural female, urban male and urban female. We fit the mortality values using the lower age limit and the upper age limit of each interval separately as the explanatory variable. Both fits are good for all subpopulations. We then consider a unique polynomial approximation based on these two best-fit polynomials. This polynomial approximation is good for all subpopulations except rural male, urban male and female subpopulations.
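The polynomial regression model in one variable is given by

$$m_x = e^{\beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \cdots + \beta_p x^p + \varepsilon} \qquad (1)$$

or, taking logarithms,

$$\log(m_x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \cdots + \beta_p x^p + \varepsilon \qquad (2)$$

Model (2) is linear in the parameters, so the techniques for fitting a linear regression model can be used for fitting the polynomial regression model.

As in the case of the multiple linear regression model, the equation can be expressed in matrix notation as

$$f = X\beta + \varepsilon \qquad (3)$$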
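
Written out for $k$ observed age groups, and assuming the usual full-rank condition on $X$, the matrix form and its least squares estimate take the following standard shape. The symbols $k$, $x_1, \dots, x_k$ and $\hat{\beta}$ are introduced here for illustration only.

```latex
f = \begin{pmatrix} \log m_{x_1} \\ \log m_{x_2} \\ \vdots \\ \log m_{x_k} \end{pmatrix},
\qquad
X = \begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^p \\
1 & x_2 & x_2^2 & \cdots & x_2^p \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_k & x_k^2 & \cdots & x_k^p
\end{pmatrix},
\qquad
\hat{\beta} = (X'X)^{-1} X' f
```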


The order of the polynomial model is kept as low as possible (and must be less than the number of observations). A sound strategy is needed to choose the order of the approximating polynomial. One approach is to fit models of successively increasing order and test the significance of the regression coefficients at each step, increasing the order until the t-test for the highest-order term is non-significant. This is called the forward selection procedure. Another approach is to fit the appropriate highest-order model and then delete terms one at a time, starting with the highest order, until the highest-order remaining term has a significant t-statistic. This is called the backward elimination procedure. The forward selection and backward elimination procedures do not necessarily lead to the same model (Shalabh, 2018). In the present paper, we use the forward selection procedure.
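
The forward selection idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to produce the results of this paper: the data are synthetic, the explanatory variable is rescaled to [-1, 1], and the fixed threshold `f_crit = 4.0` is a rough stand-in for a tabulated 5% critical value (in practice it would depend on the residual degrees of freedom). The sketch uses the fact that the squared t-statistic of the highest-order coefficient equals the partial F statistic for adding that term. The names `solve`, `fit_poly` and `forward_select` are hypothetical helpers.

```python
def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def fit_poly(x, f, degree):
    """Least squares fit of f on 1, x, ..., x**degree; returns (beta, rss)."""
    X = [[xi ** j for j in range(degree + 1)] for xi in x]
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(degree + 1)]
           for i in range(degree + 1)]
    xtf = [sum(row[i] * fi for row, fi in zip(X, f)) for i in range(degree + 1)]
    beta = solve(xtx, xtf)  # normal equations (X'X) beta = X'f
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    rss = sum((fi - gi) ** 2 for fi, gi in zip(f, fitted))
    return beta, rss


def forward_select(x, f, max_degree=6, f_crit=4.0):
    """Raise the order until the highest-order term is no longer significant."""
    n = len(x)
    best_degree = 0
    _, rss_prev = fit_poly(x, f, 0)
    for degree in range(1, max_degree + 1):
        df = n - degree - 1  # residual degrees of freedom
        if df <= 0:
            break
        _, rss = fit_poly(x, f, degree)
        # Partial F for the newly added term (the square of its t-statistic).
        f_stat = (rss_prev - rss) / (rss / df)
        if f_stat < f_crit:  # assumed threshold, see the note above
            break
        best_degree, rss_prev = degree, rss
    return best_degree


# Synthetic log-mortality-like series: a quadratic plus a small alternating error.
x = [-1 + 2 * i / 11 for i in range(12)]
f = [-6 + 2.0 * xi + 1.5 * xi ** 2 + 0.01 * (-1) ** i for i, xi in enumerate(x)]
print(forward_select(x, f))
```

Solving the normal equations directly is adequate for this low-order, rescaled example; for high orders on raw ages a numerically more stable solver would be the safer choice.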


In the mortality data, a lower-order polynomial does not provide a good fit. One way to get a good fit is to increase the order of the polynomial. A higher-order polynomial works well for mortality data with five-year age intervals and is suitable when single-year mortality data are not available. For single-year-of-age data, however, raising the order does not improve the fit significantly. Such problems can be addressed by fitting an appropriate polynomial function over different ranges of the explanatory variable, that is, by fitting the polynomial regression in pieces. Spline functions, which are piecewise polynomials, can be used for this purpose. For the analysis of mortality data with age intervals, we use polynomial regression of higher order.


To keep the modelling bias satisfactorily small, the degree $p$ of the polynomial often has to be large. In the present paper, the highest order of polynomial regression model used is six. The problem of multicollinearity in higher-order polynomials can be reduced by centering the explanatory variable.
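
The effect of centering can be illustrated with a small Python sketch. It assumes the lower limits of the 18 age groups of Tables 1-3 as the explanatory variable and uses the correlation between the linear and quadratic terms as a simple indicator of multicollinearity. The helper `pearson` and the variable names are illustrative and not part of the paper.

```python
def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


# Lower age limits 0, 1, 5, 10, ..., 80 (assumed choice of explanatory variable).
ages = [0, 1] + list(range(5, 85, 5))
mean_age = sum(ages) / len(ages)
centred = [a - mean_age for a in ages]

# Correlation between the first two polynomial terms, before and after centering.
raw_corr = pearson(ages, [a ** 2 for a in ages])
centred_corr = pearson(centred, [c ** 2 for c in centred])

print("corr(x, x^2) raw:     ", round(raw_corr, 3))
print("corr(x, x^2) centered:", round(centred_corr, 3))
```

The raw terms are almost collinear, while the centered terms are only weakly correlated, which is why the centered design matrix is better conditioned.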


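With the explanatory variable centered at $\bar{x}$, model (2) of order six can be written as

$$\log(m_x) = \beta_0 + \beta_1 (x-\bar{x})_+ + \beta_2 (x-\bar{x})^2 + \beta_3 (x-\bar{x})_+^3 + \beta_4 (x-\bar{x})^4 + \beta_5 (x-\bar{x})_+^5 + \beta_6 (x-\bar{x})^6 + \varepsilon$$

or

$$f = \beta_0 + \beta_1 k_1 + \beta_2 k_2 + \beta_3 k_3 + \beta_4 k_4 + \beta_5 k_5 + \beta_6 k_6 + \varepsilon \qquad (4)$$

where $f = \log m_x$, $k_i = \max\{(x-\bar{x})^i, 0\}$ for $i = 1, 3, 5$ and $k_j = (x-\bar{x})^j$ for $j = 2, 4, 6$.

Therefore, equation (3) becomes

$$f = A\beta + \varepsilon \qquad (5)$$

where $A$ is the design matrix whose columns are $1, k_1, \dots, k_6$.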
A basic assumption in linear regression analysis is that the matrix $A$ is of full rank. In polynomial regression models, as the order increases, the matrix $A'A$ becomes ill-conditioned; the computed inverse $(A'A)^{-1}$ may then be inaccurate, and the parameters will be estimated with considerable error. This ill-conditioning is reduced by centering the explanatory variable. For fitting mortality data, we consider a best uniform approximating polynomial, which is unique. The following theorem is taken from Celant and Broniatowski (2016, page 6).


Theorem: Let $f$ be a continuous function defined on $[a, b]$. The best uniform approximating polynomial is unique.


Here, we assume that two such polynomials of best approximation, say $P(x)$ and $Q(x)$, exist. Then

$$S(x) = \frac{P(x) + Q(x)}{2} \qquad (6)$$

By the theorem, $S(x)$ is the best and unique uniform approximating polynomial. We therefore take the polynomial $S(x)$ as the best uniform approximation of $f$.
Figures (k-l): Comparison of Observed and Fitted Values of Age-Specific Mortality (in log scales) for (k) Rural Male (Adjusted $R^2$ = 0.9884) and (l) Rural Female (Adjusted $R^2$ = 0.992) Population during the Year 2012-16.

Figures (m-n): Comparison of Observed and Fitted Values of Age-Specific Mortality (in log scales) for (m) Indian Rural Total (Adjusted $R^2$ = 0.9916) and (n) Urban Male (Adjusted $R^2$ = 0.9784) Population during the Year 2012-16.

Figures (o-p): Comparison of Observed and Fitted Values of Age-Specific Mortality (in log scales) for (o) Urban Female (Adjusted $R^2$ = 0.969) and (p) Male (Adjusted $R^2$ = 0.9739) Population during the Year 2012-16.

Figures (q-r): Comparison of Observed and Fitted Values of Age-Specific Mortality (in log scales) for (q) Urban Total (Adjusted $R^2$ = 0.9869) and (r) Indian Total (Adjusted $R^2$ = 0.9904) Population during the Year 2012-16.

Figure (s): Comparison of Observed and Fitted Values of Age-Specific Mortality (in log scales) for Total Indian (Adjusted $R^2$ = 0.9897) Population during the Year 2012-16.
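
The adjusted $R^2$ values quoted in the captions are presumably based on the usual definition, sketched here under the assumption of $k$ observed age groups and a fitted polynomial of order $p$. The symbols $k$, $f_i$, $\hat{f}_i$ and $\bar{f}$ (observed, fitted and mean log mortality) are introduced for illustration.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{k} (f_i - \hat{f}_i)^2}{\sum_{i=1}^{k} (f_i - \bar{f})^2},
\qquad
\bar{R}^2 = 1 - (1 - R^2)\,\frac{k - 1}{k - p - 1}
```

Unlike $R^2$, the adjusted version penalizes each additional polynomial term, which makes it a more suitable summary when fits of different order are compared.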


CONSTRUCTION OF LIFE TABLES 


We first obtained the age-specific death rates (${}_nm_x$) for each age group for India from the SRS data of India, 2012-16. We use the values of ${}_na_x$, the average number of years lived in the age interval $x$ to $x+n$ by those dying in the interval (see Chiang, 1984 and Schoen, 1978). Using the set of values of ${}_na_x$ for the different age groups of India (2012-16), we constructed the life tables for the total, male and female populations, including the rural and urban populations. For comparison, the values of ${}_nq_x$ and $e_x^0$ of the subpopulations are given in Tables 1-3. Chiang noted that the value of the fraction ${}_na_x$ depends on the mortality pattern over the entire age interval and not on the mortality rate for any single year; we take ${}_na_x = n/2$. When the mortality rate decreases with age within an interval, a fraction ${}_na_x < n/2$ is considered; when the reverse prevails, ${}_na_x > n/2$ is considered. Therefore we considered ${}_na_x = 0.365i$ for the age group 1-5 and ${}_na_x = 0.641i$ for the age group 15-20, where $i$ is the length of the age interval. This is due to the difference in mortality pattern in these groups.

Different columns of the life tables are calculated by using their interrelationships.

Age group $(x, x+n)$
Age interval or period of life between two exact ages stated in years

$${}_nq_x = \frac{n \cdot {}_nm_x}{1 + (n - {}_na_x) \cdot {}_nm_x}$$
Proportion of persons alive at the beginning of the age interval who die during the age interval

$$l_{x+n} = l_x \, (1 - {}_nq_x)$$
Of the starting number of newborns in the life table (called the radix of the life table, usually set at 100,000), the number living at the beginning of the age interval (or the number surviving to the beginning of the age interval)

$${}_nd_x = l_x - l_{x+n}$$
The number of persons in the cohort who die in the age interval $(x, x+n)$

$${}_nL_x = n \cdot l_{x+n} + {}_na_x \cdot {}_nd_x$$
Number of years of life lived by the cohort within the indicated age interval $(x, x+n)$ (or person-years of life in the age interval)

$$T_x = {}_nL_x + {}_nL_{x+n} + \cdots + {}_nL_{\omega}$$
Total person-years of life contributed by the cohort after attaining age $x$ (the sum runs to the last age interval, which starts at age $\omega$)

$$e_x^0 = \frac{T_x}{l_x}$$
Average number of years of life remaining for a person alive at the beginning of age interval $x$
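
The chain of column relationships can be sketched in Python as follows. This is a minimal illustration rather than the exact computation behind Tables 1-3: the three age groups and their death rates are hypothetical, the value 0.3 for the first interval is an assumption, and closing the table with an open-ended last interval (everyone dies in it, with person-years $l_x / m_x$) is also an assumption, since the treatment of the 80+ group is not spelled out in the text. The function name `life_table` is illustrative.

```python
def life_table(n, m, a, radix=100_000.0):
    """Build abridged life-table columns from interval widths n, death rates m
    and average years lived by those dying a (lists of equal length).
    The last interval is treated as open-ended (assumption)."""
    k = len(m)
    q, l, d, L = [], [radix], [], []
    for i in range(k):
        last = i == k - 1
        # Probability of dying in the interval; 1 for the open-ended group.
        qi = 1.0 if last else n[i] * m[i] / (1 + (n[i] - a[i]) * m[i])
        di = l[i] * qi  # deaths in the interval
        # Person-years: survivors live n years, the dying live a years on average.
        Li = l[i] / m[i] if last else n[i] * (l[i] - di) + a[i] * di
        q.append(qi)
        d.append(di)
        L.append(Li)
        if not last:
            l.append(l[i] - di)  # survivors to the start of the next interval
    T = [sum(L[i:]) for i in range(k)]   # person-years remaining after age x
    e = [T[i] / l[i] for i in range(k)]  # life expectancy at age x
    return {"q": q, "l": l, "d": d, "L": L, "T": T, "e": e}


# Hypothetical inputs for three age groups: 0-1, 1-5 and an open-ended 5+ group.
n = [1, 4, None]             # interval widths; unused for the open group
m = [0.040, 0.002, 0.015]    # illustrative death rates, not SRS values
a = [0.3, 0.365 * 4, None]   # 0.3 is assumed; 0.365*i follows the 1-5 rule above
table = life_table(n, m, a)
print(round(table["e"][0], 2))
```

Feeding the function the fitted ${}_nm_x$ values and the ${}_na_x$ choices described above would give columns of the same kind as those reported in the tables, subject to the stated assumption about the last age group.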


Table 1: Estimated Mortality Rate (${}_nq_x$) and Life Expectancy ($e_x^0$) for Total, Female and Male Indian Population

Age      Total Population      Female Population     Male Population
Group    nqx       e0x         nqx       e0x         nqx       e0x
0-1      0.04014   67.80899    0.04137   69.06721    0.03908   66.63974
1-5      0.00770   69.62376    0.00922   71.02625    0.00635   68.3296
5-10     0.00379   66.13884    0.00394   67.65687    0.00364   64.74578
10-15    0.00315   61.38095    0.00295   62.91461    0.00334   59.97318
15-20    0.00484   56.56701    0.00489   58.09336    0.00479   55.16578
20-25    0.00698   51.83312    0.00638   53.36973    0.00757   50.4224
25-30    0.00777   47.17988    0.00668   48.69636    0.00886   45.78794
30-35    0.00975   42.52976    0.00742   44.00702    0.01193   41.1749
35-40    0.01341   37.9239     0.0098    39.31731    0.01691   36.64186
40-45    0.01834   33.40539    0.01376   34.68169    0.02256   32.22913
45-50    0.02623   28.98278    0.01879   30.13069    0.03322   27.9153
50-55    0.04254   24.69614    0.03609   25.65981    0.04819   23.7886
55-60    0.06241   20.68231    0.04957   21.52694    0.07601   19.86644
60-65    0.09228   16.89261    0.08078   17.5193     0.10323   16.29505
65-70    0.13633   13.35578    0.12197   13.83918    0.15047   12.88305
70-75    0.20462   10.06936    0.18581   10.41434    0.22384   9.72211
75-80    0.29153   7.01666     0.26625   7.22051     0.31827   6.80492
80+      0.44991   3.87523     0.42664   3.9334      0.47412   3.8147

Table 2: Estimated Mortality Rate (${}_nq_x$) and Life Expectancy ($e_x^0$) for Indian Total, Female and Male Rural Population

Age      Total Population      Female Population     Male Population
Group    nqx       e0x         nqx       e0x         nqx       e0x
0-1      0.0447    66.61674    0.04575   67.90930    0.04379   65.40127
1-5      0.00926   68.71044    0.01132   70.14113    0.00747   67.37345
5-10     0.00419   65.32221    0.00439   66.90719    0.00404   63.85602
10-15    0.00339   60.58654    0.00315   62.19118    0.00359   59.1049
15-20    0.00509   55.78413    0.00514   57.37981    0.00504   54.30884
20-25    0.00752   51.06004    0.00683   52.66668    0.00817   49.57457
25-30    0.00876   46.42798    0.00777   48.01168    0.0097    44.96234
30-35    0.01084   41.81619    0.00856   43.36807    0.01297   40.37825
35-40    0.01475   37.24705    0.01109   38.72092    0.01829   35.87599
40-45    0.0205    32.76724    0.01593   34.12712    0.02476   31.49781
45-50    0.02881   28.4007     0.02104   29.63909    0.03617   27.23403
50-55    0.04762   24.16904    0.0411    25.22237    0.05329   23.16223
55-60    0.06804   20.25251    0.05299   21.19629    0.08484   19.3253
60-65    0.10096   16.54858    0.08865   17.24244    0.11282   15.88509
65-70    0.14446   13.1262     0.12781   13.67649    0.16108   12.58723
70-75    0.2144    9.92046     0.19275   10.31428    0.23678   9.52407
75-80    0.29839   6.94559     0.26953   7.18012     0.32898   6.70320
80+      0.46549   3.83628     0.4372    3.907       0.49444   3.7639

Table 3: Estimated Mortality Rate (${}_nq_x$) and Life Expectancy ($e_x^0$) for Indian Total, Female and Male Urban Population

Age      Total Population      Female Population     Male Population
Group    nqx       e0x         nqx       e0x         nqx       e0x
0-1      0.0263    70.91088    0.02797   72.11978    0.02483   69.8101
1-5      0.00303   71.81270    0.00295   73.18063    0.00315   70.57489
5-10     0.00265   68.02135    0.00275   69.38778    0.0026    66.78791
10-15    0.0025    63.19543    0.00235   64.57223    0.0026    61.95549
15-20    0.00409   58.34755    0.00414   59.71845    0.00404   57.11048
20-25    0.00568   53.57959    0.00529   54.95903    0.00608   52.33465
25-30    0.00573   48.87137    0.00439   50.23801    0.00713   47.6395
30-35    0.00757   44.13861    0.00504   45.4485     0.00995   42.96366
35-40    0.01074   39.45622    0.00713   40.66606    0.0141    38.37032
40-45    0.01391   34.85744    0.00931   35.94014    0.01809   33.88332
45-50    0.02099   30.31388    0.01421   31.25439    0.02729   29.46151
50-55    0.03294   25.91021    0.02687   26.66888    0.03844   25.21793
55-60    0.05085   21.70761    0.04186   22.33623    0.05929   21.12612
60-65    0.07291   17.73664    0.06267   18.20285    0.08226   17.30006
65-70    0.11672   13.93491    0.10739   14.25275    0.12557   13.62664
70-75    0.1804    10.44597    0.16835   10.66672    0.19238   10.22445
75-80    0.27416   7.19493     0.2579    7.31990     0.29117   7.06446
80+      0.41269   3.96828     0.40202   3.99495     0.42423   3.93943

CONCLUSIONS


We use the polynomial regression model to fit mortality data where mortalities are available by age interval. For higher-order polynomials, the problem of multicollinearity is addressed by centering the explanatory variable. We observe from the figures that the fit is very good for all the heterogeneous subpopulations except the urban population. The validity of the model is assessed by the adjusted $R^2$. For all the subpopulations, polynomial approximation is a suitable choice of model for fitting mortality data. Once the mortality data by age interval are obtained, the rest of the columns of the life tables are constructed using their interrelationships. The fitting for both cases (lower limit and upper limit) and for all subpopulations is good. We consider a unique polynomial approximation using these two best-fit polynomials. This polynomial approximation is good for all subpopulations except rural male, urban male and female subpopulations.